Skip to content

fix: store KeyIndex entries without exptime to allow safe flush_expired usage#13

Open
sihyeonn wants to merge 1 commit into
api7:mainfrom
sihyeonn:fix/key-index-permanent-ttl
Open

fix: store KeyIndex entries without exptime to allow safe flush_expired usage#13
sihyeonn wants to merge 1 commit into
api7:mainfrom
sihyeonn:fix/key-index-permanent-ttl

Conversation

@sihyeonn
Copy link
Copy Markdown

@sihyeonn sihyeonn commented Apr 14, 2026

Problem

When metrics are registered with an exptime (e.g. expire=300), the KeyIndex
index entries (__key:N) are stored with the same exptime via dict:add().

While metric value keys have their TTL refreshed on every incr/set call,
the KeyIndex index entries are never refreshed because prometheus.lua's
lookup_or_create() returns early from its local Lua cache on cache hit,
skipping the key_index:add() call entirely:

-- prometheus.lua lookup_or_create()
local full_name = t[LEAF_KEY]
if full_name then
    return full_name  -- cache hit: key_index:add() is never called
end

After exptime seconds, the index entry expires and gets removed by
flush_expired() — even for metrics that are still actively being recorded.
On the next metric_data() collection, sync_expired() detects the missing
index entry and removes it from memory, causing the metric to silently
disappear from Prometheus output.

Note on existing TTL refresh code

The current code already attempts to refresh TTL for existing keys (lines 118-133):

if self.index[key] ~= nil then
    if exptime then
        local ok = self.dict:expire(...)
    end
end

However, this code is unreachable for active metrics because lookup_or_create()'s
local cache prevents key_index:add() from being called in the first place.

Fix

Store KeyIndex entries with nil exptime (permanent). Since dict:add() only
succeeds once per key (returns 'exists' error if the key already exists),
changing exptime to nil here means the index entry is created permanently,
while metric value keys still expire normally via incr/set.

flush_expired() will then only remove expired metric value entries, leaving
the index intact — making it safe to call flush_expired() periodically
to reclaim nginx slab memory from expired metrics.

Reproduction

  1. Register a metric with exptime=300
  2. Send requests so the metric is actively recorded
  3. Wait 300 seconds without restarting nginx
  4. Call ngx.shared["prometheus-metrics"]:flush_expired()
  5. Observe: the metric disappears from /metrics output despite still being active

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Fixed key expiration handling to ensure proper TTL management when adding new cache entries.

…ed usage

When metrics are registered with an exptime (e.g. expire=300), the KeyIndex
index entries (__key:N) were also stored with the same exptime via dict:add().

However, while metric value keys have their TTL refreshed on every incr/set call,
the KeyIndex index entries are NOT refreshed because prometheus.lua's
lookup_or_create() returns early from its local Lua cache on cache hit,
skipping the key_index:add() call entirely.

This means that after 'exptime' seconds, the index entry expires and gets
removed by flush_expired() — even for metrics that are still actively being
recorded. On the next metric_data() collection, sync_expired() detects the
missing index entry and removes it from memory, causing the metric to silently
disappear from Prometheus output.

Fix: store KeyIndex entries with nil exptime (permanent). Since dict:add()
only succeeds once per key (returns 'exists' error if key exists), setting
nil here means the index entry is created permanently, while metric value
keys still expire normally. flush_expired() will then only remove expired
metric value entries, leaving the index intact.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

Modified the KeyIndex:add() method in prometheus_keys.lua to explicitly pass nil as the TTL argument when adding new keys to ngx.shared.dict, instead of passing through the exptime parameter. Conditional expiration logic remains unchanged.

Changes

Cohort / File(s) Summary
Key Expiration Logic
prometheus_keys.lua
Modified KeyIndex:add() to use explicit nil TTL when adding keys to shared dictionary, while preserving subsequent expiration marking and cleanup logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
E2e Test Quality Review ⚠️ Warning PR introduces a critical TTL/expiration bug fix but provides no E2E tests to validate it; repository has no test infrastructure. Add E2E tests validating metrics remain in output after TTL expiration and flush_expired() calls; establish test framework and CI/CD integration.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: storing KeyIndex entries without exptime to enable safe flush_expired usage, which directly addresses the bug being fixed.
Security Check ✅ Passed The PR modifies TTL handling in prometheus_keys.lua without introducing security vulnerabilities across authentication, authorization, credential handling, data exposure, or resource isolation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
prometheus_keys.lua (1)

141-143: Minor inconsistency: marking expire_keys for permanent entries.

This code marks expire_keys[N] = true when exptime > 0, but the index entry is now created with no TTL (permanent). This means:

  1. remove_expired_keys() will iterate over these keys
  2. dict:ttl() returns 0 for permanent keys
  3. The condition ttl >= 0 is true, so keys are never removed

While not incorrect (the code is defensive), it causes unnecessary iteration in remove_expired_keys() over keys that can never expire. Additionally, there's an inconsistency with sync_range() at lines 73-79, which uses ttl ~= 0 to decide whether to mark expire_keys[i]—permanent keys will not be marked during sync.

This is minor since the functional behavior is correct, but you could consider either:

  • Not marking expire_keys for permanent index entries (remove lines 141-143)
  • Or accepting this as acceptable overhead for tracking metrics that should have expiration semantics
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prometheus_keys.lua` around lines 141 - 143, The loop that sets
self.expire_keys[N] = true when exptime > 0 should avoid marking permanent index
entries (exptime == 0) to prevent needless work in remove_expired_keys(); update
the logic around exptime and the creation of the index entry (the exptime
variable and assignment to self.expire_keys[N]) so it only marks expire_keys for
entries with a nonzero TTL (consistent with sync_range which checks ttl ~= 0),
and ensure remove_expired_keys, dict:ttl checks, and the index creation paths (N
/ i usage) remain consistent after this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@prometheus_keys.lua`:
- Around line 141-143: The loop that sets self.expire_keys[N] = true when
exptime > 0 should avoid marking permanent index entries (exptime == 0) to
prevent needless work in remove_expired_keys(); update the logic around exptime
and the creation of the index entry (the exptime variable and assignment to
self.expire_keys[N]) so it only marks expire_keys for entries with a nonzero TTL
(consistent with sync_range which checks ttl ~= 0), and ensure
remove_expired_keys, dict:ttl checks, and the index creation paths (N / i usage)
remain consistent after this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b23dad04-3911-434d-9a05-a013654f03ca

📥 Commits

Reviewing files that changed from the base of the PR and between f86aa3e and 06a1151.

📒 Files selected for processing (1)
  • prometheus_keys.lua

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants